AITopics | different modality

OmniSegmentor: A Flexible Multi-Modal Learning Framework for Semantic Segmentation

Neural Information Processing Systems | Jun-22-2026, 20:45:40 GMT

Recent research on representation learning has proved the merits of multi-modal clues for robust semantic segmentation. Nevertheless, a flexible pretrain-and-finetune pipeline for multiple visual modalities remains unexplored. In this paper, we propose a novel multi-modal learning framework, termed OmniSegmentor.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

LBMKGC: Large Model-Driven Balanced Multimodal Knowledge Graph Completion

Neural Information Processing Systems | Jun-21-2026, 00:48:11 GMT

Multi-modal Knowledge Graph Completion (MMKGC) aims to predict missing entities, relations, or attributes in knowledge graphs by collaboratively modeling the triple structure and multimodal information (e.g., text, images, videos) associated with entities.

artificial intelligence, information, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
Asia (0.68)
North America > Canada (0.68)
North America > United States > Louisiana (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

CMoB: Modality Valuation via Causal Effect for Balanced Multimodal Learning

Neural Information Processing Systems | Jun-18-2026, 15:17:30 GMT

Existing early and late fusion frameworks in multimodal learning are confronted with the fundamental challenge of modality imbalance, wherein disparities in representational capacities induce inter-modal competition during training. Current research methodologies primarily rely on modality-level contribution assessments to measure gaps in representational capabilities and enhance poorly learned modalities, overlooking the dynamic variations of modality contributions across individual samples. To address this, we propose a Causal-aware Modality valuation approach for Balanced multimodal learning (CMoB). We define a benefit function based on Shannon's theory of informational uncertainty to evaluate the changes in the importance of samples across different stages of multimodal training. Inspired by human cognitive science, we propose a causal-aware modality contribution quantification method from a causal perspective to capture fine-grained changes in modality contribution degrees within samples. In the iterative training of multimodal learning, we develop targeted modal enhancement strategies that dynamically select and optimize modalities based on real-time evaluation of their contribution variations across training samples. Our method enhances the discriminative ability of key modalities and the learning capacity of weak modalities while achieving fine-grained balance in multimodal learning. Extensive experiments on benchmark multimodal datasets and multimodal frameworks demonstrate the superiority of our CMoB approach for balanced multimodal learning.
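
The abstract does not spell out the benefit function, so the following is only a minimal sketch of the underlying idea of Shannon uncertainty: it measures the entropy of a model's predicted class distribution for one sample and how much that entropy drops between two training stages. The function name `uncertainty_reduction` and the example distributions are hypothetical illustrations, not the CMoB formulation.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def uncertainty_reduction(probs_before, probs_after):
    """Hypothetical 'benefit' of a sample: drop in predictive entropy
    between an earlier and a later training stage (assumption, not CMoB)."""
    return shannon_entropy(probs_before) - shannon_entropy(probs_after)

# Predicted class probabilities for one sample at two training stages.
early = [0.25, 0.25, 0.25, 0.25]   # maximally uncertain over 4 classes
late = [0.5, 0.5, 0.0, 0.0]        # uncertainty narrowed to 2 classes

print(shannon_entropy(early))              # 2.0
print(uncertainty_reduction(early, late))  # 1.0
```

A per-sample quantity of this kind varies from sample to sample and from stage to stage, which is the property the abstract contrasts with modality-level contribution scores.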

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Education (0.68)
Health & Medicine (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Generalized Contrastive Learning for Universal Retrieval

Neural Information Processing Systems | Jun-17-2026, 04:30:15 GMT

Despite their consistent performance improvements, cross-modal retrieval models (e.g., CLIP) show degraded performance when retrieving keys composed of fused image-text modalities (e.g., Wikipedia pages with both images and text). To address this critical challenge, multimodal retrieval has recently been explored to develop a single unified retrieval model capable of retrieving keys across diverse modality combinations. A common approach involves constructing new composed sets of image-text triplets (e.g., retrieving a pair of image and text given a query image). However, such an approach requires careful curation to ensure dataset quality and fails to generalize to unseen modality combinations. To overcome these limitations, this paper proposes Generalized Contrastive Learning (GCL), a novel loss formulation that improves multimodal retrieval performance without the burden of new dataset curation. Specifically, GCL enforces contrastive learning across all modalities within a mini-batch, using existing image-caption paired datasets to learn a unified representation space. We demonstrate the effectiveness of GCL by showing consistent performance improvements on off-the-shelf multimodal retrieval models (e.g., VISTA, CLIP, and TinyCLIP) using the M-BEIR, MMEB, and CoVR benchmarks.
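
To make "contrastive learning across all modalities within a mini-batch" concrete, here is a minimal sketch in plain Python. It assumes each sample has one embedding per modality view (image, text, and a fused image-text view) and averages a standard InfoNCE loss over every ordered pair of views. The function names, the `fused` view, and the temperature value are assumptions for illustration; the abstract does not give the exact GCL loss.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def info_nce(queries, keys, temperature=0.07):
    """Mean InfoNCE loss: queries[i] should match keys[i];
    all other keys in the mini-batch act as negatives."""
    queries = [normalize(q) for q in queries]
    keys = [normalize(k) for k in keys]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, k) / temperature for k in keys]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(queries)

def all_pairs_contrastive_loss(views, temperature=0.07):
    """views maps a modality name to a list of embeddings aligned by sample index.
    Averages InfoNCE over every ordered pair of distinct modality views."""
    names = list(views)
    losses = [info_nce(views[a], views[b], temperature)
              for a in names for b in names if a != b]
    return sum(losses) / len(losses)

# Toy mini-batch of two samples with 2-D embeddings per view (hypothetical values).
batch = {
    "image": [[1.0, 0.0], [0.0, 1.0]],
    "text":  [[0.9, 0.1], [0.1, 0.9]],
    "fused": [[1.0, 0.1], [0.1, 1.0]],
}
print(all_pairs_contrastive_loss(batch))
```

Because the loss only needs aligned views of the same sample, an ordinary image-caption dataset is enough to supply the inputs in this sketch; no curated triplets are required.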

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Dynamic Masking and Auxiliary Hash Learning for Enhanced Cross-Modal Retrieval

Neural Information Processing Systems | Jun-16-2026, 03:37:52 GMT

The demand for multimodal data processing drives the development of information technology. Cross-modal hash retrieval has attracted much attention because it can overcome modal differences and achieve efficient retrieval, and it has shown great application potential in many practical scenarios. Existing cross-modal hashing methods have difficulty fully capturing the semantic information of different modal data, which leads to a significant semantic gap between modalities. Moreover, these methods often ignore differences in channel importance, and because they are limited to a single objective, the matching between hash codes also suffers to some extent. To address these issues, we propose a Dynamic Masking and Auxiliary Hash Learning (AHLR) method for enhanced cross-modal retrieval.
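
For readers unfamiliar with hash retrieval, the sketch below shows the generic retrieval step that makes it efficient: real-valued features from either modality are binarized into short codes, and candidates are ranked by Hamming distance. It is a textbook illustration under assumed names (`to_hash_code`, `retrieve`) and made-up feature values; it does not implement the dynamic masking or auxiliary hash learning the abstract proposes.

```python
def to_hash_code(features):
    """Binarize real-valued features by sign: one bit per dimension."""
    return [1 if x >= 0 else 0 for x in features]

def hamming_distance(a, b):
    """Number of bit positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

def retrieve(query_code, database_codes, top_k=1):
    """Return indices of the top_k database codes closest to the query."""
    ranked = sorted(range(len(database_codes)),
                    key=lambda i: hamming_distance(query_code, database_codes[i]))
    return ranked[:top_k]

# Hypothetical text-query features and image codes in a shared 4-bit hash space.
query_code = to_hash_code([0.9, -0.2, 0.4, -0.7])   # [1, 0, 1, 0]
image_codes = [
    [0, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 0, 1, 0],
]
print(retrieve(query_code, image_codes, top_k=2))   # [2, 1]
```

Retrieval quality in this setup depends entirely on how well codes from different modalities agree for matching items, which is the semantic gap the abstract refers to.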

artificial intelligence, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

Asia > China (0.28)
North America > United States > New York (0.28)
North America > Canada (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

LBMKGC: Large Model-Driven Balanced Multimodal Knowledge Graph Completion

Neural Information Processing Systems | Jun-13-2026, 15:08:32 GMT

Multi-modal Knowledge Graph Completion (MMKGC) aims to predict missing entities, relations, or attributes in knowledge graphs by collaboratively modeling the triple structure and multimodal information (e.g., text, images, videos) associated with entities.

artificial intelligence, name change, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.83)

Self-Supervised MultiModal Versatile Networks

Neural Information Processing Systems | Apr-30-2026, 19:38:33 GMT

Videos are a rich source of multi-modal supervision. In this work, we learn representations using self-supervision by leveraging three modalities naturally present in videos: visual, audio and language streams. To this end, we introduce the notion of a multimodal versatile network - a network that can ingest multiple modalities and whose representations enable downstream tasks in multiple modalities. In particular, we explore how best to combine the modalities, such that fine-grained representations of the visual and audio modalities can be maintained, whilst also integrating text into a common embedding. Driven by versatility, we also introduce a novel process of deflation, so that the networks can be effortlessly applied to the visual data in the form of video or a static image. We demonstrate how such networks trained on large collections of unlabelled video data can be applied on video, video-text, image and audio tasks. Equipped with these representations, we obtain state-of-the-art performance on multiple challenging benchmarks including UCF101, HMDB51, Kinetics600, AudioSet and ESC-50 when compared to previous self-supervised work. Our models are publicly available.

f88bec15cc4cb56b432ee040bb63f94f-Paper-Conference.pdf

Neural Information Processing Systems | Apr-30-2026, 08:40:58 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.68)
(3 more...)

Cross-Modality Perturbation Synergy Attack for Person Re-identification

Neural Information Processing Systems | Mar-19-2026, 08:00:20 GMT

In recent years, there has been significant research focusing on addressing security concerns in single-modal person re-identification (ReID) systems that are based on RGB images. However, the safety of cross-modality scenarios, which are more commonly encountered in practical applications involving images captured by infrared cameras, has not received adequate attention. The main challenge in cross-modality ReID lies in effectively dealing with visual differences between different modalities. For instance, infrared images are typically grayscale, unlike visible images that contain color information. Existing attack methods have primarily focused on the characteristics of the visible image modality, overlooking the features of other modalities and the variations in data distribution among different modalities. This oversight can potentially undermine the effectiveness of these methods in image retrieval across diverse modalities. This study represents the first exploration into the security of cross-modality ReID models and proposes a universal perturbation attack specifically designed for cross-modality ReID.

artificial intelligence, modality, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.42)
